NYC Rat Sightings vs Subway Entrances - Animated MapΒΆ
This notebook presents an animated geospatial analysis of NYC rat sightings in relation to subway entrances. Motivated by the rise in reported sightings during the Covid-19 pandemic, the project explores whether rat movement patterns align with urban transit infrastructure. The analysis combines open data sources, nearest-neighbor modeling, and interactive visualization to illustrate how public datasets can be leveraged to study behavioral and environmental dynamics in a city.
Key Skills Demonstrated:
- Data ingestion from public APIs (NYC Open Data via Socrata)
- Data cleaning, preprocessing, and temporal aggregation (pandas)
- Spatial analysis using nearest-neighbor search (scikit-learn KDTree)
- Geospatial visualization and animation (Plotly/Mapbox)
- Reproducible, interactive reporting (Jupyter Notebook)
β οΈ Tokens/Files:
- Socrata app token is optional but recommended.
- Mapbox is optional β if you don't have a token, the notebook uses a USGS raster tile as a free basemap layer.
InΒ [Β ]:
# (Optional) Install dependencies. Uncomment if needed.
# %pip install pandas sodapy plotly numpy scikit-learn
InΒ [1]:
import os
import numpy as np
import pandas as pd
from sodapy import Socrata
import plotly.express as px
from sklearn.neighbors import KDTree
# ---- Configuration ----
SOCRATA_DOMAIN = "data.cityofnewyork.us"
SOCRATA_APP_TOKEN = os.getenv("SOCRATA_APP_TOKEN", None) # set in env or paste below (not recommended to commit)
# Local CSV fallback for subway entrances (if you have it):
LOCAL_MTA_CSV = "NYC_Transit_Subway_Entrance_And_Exit_Data.csv"
# If you don't have the local CSV, you can use a public URL for the same dataset:
# (You can replace with the official Open Data CSV export link if preferred.)
MTA_URL = "https://data.ny.gov/resource/i9wp-a4ja.csv?$query=SELECT%20division%2C%20line%2C%20borough%2C%20stop_name%2C%20complex_id%2C%20constituent_station_name%2C%20station_id%2C%20gtfs_stop_id%2C%20daytime_routes%2C%20entrance_type%2C%20entry_allowed%2C%20exit_allowed%2C%20entrance_latitude%2C%20entrance_longitude%2C%20entrance_georeference%20ORDER%20BY%20line%20ASC"
# Optional: Mapbox token (for nicer basemap). If not provided, a USGS raster layer is used.
MAPBOX_TOKEN = os.getenv("MAPBOX_TOKEN", None) # or set to a string
if MAPBOX_TOKEN:
px.set_mapbox_access_token(MAPBOX_TOKEN)
InΒ [3]:
# ---- 1) Download rat sightings (descriptor = 'Rat Sighting') ----
client = Socrata(SOCRATA_DOMAIN, SOCRATA_APP_TOKEN, timeout=480)
results = client.get("erm2-nwe9",
where="descriptor = 'Rat Sighting'",
limit=200000) # adjust limit if needed
rats = pd.DataFrame.from_records(results)
# Keep rows with lat/lon
rats = rats[rats['latitude'].notna() & rats['longitude'].notna()].copy()
# Parse created_date
rats["created_date"] = pd.to_datetime(rats["created_date"], errors="coerce")
rats = rats.dropna(subset=["created_date"])
rats = rats.sort_values("created_date")
# Build Month Year string for animation
rats["Date"] = rats["created_date"].dt.strftime("%B %Y")
# Ensure numeric lat/lon
rats["latitude"] = pd.to_numeric(rats["latitude"], errors="coerce")
rats["longitude"] = pd.to_numeric(rats["longitude"], errors="coerce")
rats = rats.dropna(subset=["latitude", "longitude"]).reset_index(drop=True)
rats.head(3)
WARNING:root:Requests made without an app_token will be subject to strict throttling limits.
Out[3]:
| unique_key | created_date | agency | agency_name | complaint_type | descriptor | location_type | incident_zip | incident_address | street_name | ... | park_borough | latitude | longitude | location | closed_date | facility_type | resolution_description | resolution_action_updated_date | due_date | Date | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 15633803 | 2010-01-01 08:29:58 | DOHMH | Department of Health and Mental Hygiene | Rodent | Rat Sighting | 3+ Family Apt. Building | 11206 | 202 PULASKI STREET | PULASKI STREET | ... | BROOKLYN | 40.692989 | -73.943771 | {'latitude': '40.69298896011082', 'longitude':... | NaN | N/A | Please contact the Department of Health and Me... | 2010-01-01T08:37:34.000 | 2010-01-31T08:29:58.000 | January 2010 |
| 1 | 15633054 | 2010-01-01 11:20:45 | DOHMH | Department of Health and Mental Hygiene | Rodent | Rat Sighting | 1-2 Family Dwelling | 11365 | 59-13 159 STREET | 159 STREET | ... | QUEENS | 40.739983 | -73.809299 | {'latitude': '40.73998332248969', 'longitude':... | NaN | N/A | Please contact the Department of Health and Me... | 2010-01-01T11:30:23.000 | 2010-01-31T11:20:45.000 | January 2010 |
| 2 | 15633896 | 2010-01-01 12:11:51 | DOHMH | Department of Health and Mental Hygiene | Rodent | Rat Sighting | 3+ Family Apt. Building | 10027 | 317 WEST 120 STREET | WEST 120 STREET | ... | MANHATTAN | 40.807367 | -73.954388 | {'latitude': '40.807367287308594', 'longitude'... | NaN | N/A | Please contact the Department of Health and Me... | 2010-01-01T12:18:09.000 | 2010-01-31T12:11:51.000 | January 2010 |
3 rows Γ 35 columns
InΒ [5]:
# ---- 2) Load subway entrances ----
# Try local CSV first, otherwise fetch via URL
if os.path.exists(LOCAL_MTA_CSV):
mta = pd.read_csv(LOCAL_MTA_CSV)
else:
try:
mta = pd.read_csv(MTA_URL)
except Exception as e:
raise RuntimeError("Could not load MTA entrances. Provide LOCAL_MTA_CSV or check MTA_URL.") from e
# Standardize column names to locate lat/lon columns
cols = {c.lower(): c for c in mta.columns}
lat_col = None
lon_col = None
# Common column name possibilities
lat_col = cols["entrance_latitude"]
lon_col = cols["entrance_longitude"]
if lat_col is None or lon_col is None:
raise ValueError("Could not find latitude/longitude columns in the MTA dataset.")
mta = mta.dropna(subset=[lat_col, lon_col]).copy()
mta_latlon = mta[[lat_col, lon_col]].astype(float).values
mta.head(3)
Out[5]:
| division | line | borough | stop_name | complex_id | constituent_station_name | station_id | gtfs_stop_id | daytime_routes | entrance_type | entry_allowed | exit_allowed | entrance_latitude | entrance_longitude | entrance_georeference | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | BMT | 4th Av | B | Atlantic Av-Barclays Ctr | 617 | Atlantic Av-Barclays Ctr | 27 | R31 | 2 3 4 5 B D N Q R | Stair | YES | YES | 40.683905 | -73.978879 | POINT (-73.978879 40.683905) |
| 1 | BMT | 4th Av | B | Atlantic Av-Barclays Ctr | 617 | Atlantic Av-Barclays Ctr | 27 | R31 | 2 3 4 5 B D N Q R | Elevator | YES | YES | 40.683805 | -73.978487 | POINT (-73.978487 40.683805) |
| 2 | BMT | 4th Av | B | Atlantic Av-Barclays Ctr | 617 | Atlantic Av-Barclays Ctr | 27 | R31 | 2 3 4 5 B D N Q R | Stair | YES | YES | 40.683928 | -73.978412 | POINT (-73.978412 40.683928) |
InΒ [7]:
# ---- 3) Nearest subway entrance via KDTree ----
tree = KDTree(mta_latlon, metric="euclidean")
# Query nearest neighbor for each rat sighting
rat_latlon = rats[["latitude", "longitude"]].values
dist, idx = tree.query(rat_latlon, k=1)
# Convert approx degrees to meters (NYC scale, rough): 1 degree ~ 111,111 meters
rats["Distance to Subway Entrance"] = (dist.flatten() * 111_111).astype(float)
# Log-transform for color scaling (avoid log(0))
rats["log10_dist"] = np.log10(np.maximum(rats["Distance to Subway Entrance"], 1e-3))
rats[["latitude","longitude","Distance to Subway Entrance","log10_dist"]].head(3)
Out[7]:
| latitude | longitude | Distance to Subway Entrance | log10_dist | |
|---|---|---|---|---|
| 0 | 40.692989 | -73.943771 | 1482.038542 | 3.170859 |
| 1 | 40.739983 | -73.809299 | 9969.632839 | 3.998679 |
| 2 | 40.807367 | -73.954388 | 328.194495 | 2.516131 |
InΒ [9]:
# ---- 4) Animated map ----
fig = px.scatter_mapbox(
rats,
lat="latitude",
lon="longitude",
animation_frame="Date",
animation_group="unique_key" if "unique_key" in rats.columns else None,
color="log10_dist",
hover_data={"Date": True, "latitude": False, "longitude": False, "Distance to Subway Entrance": True},
color_continuous_scale="armyrose",
range_color=[0, 5],
title="NYC Rat Sightings vs Distance to Nearest Subway Entrance"
)
# Animation timing tweaks (if frames exist)
if fig.layout.updatemenus and fig.layout.updatemenus[0].buttons:
try:
fig.layout.updatemenus[0].buttons[0].args[1]['frame']['duration'] = 300
fig.layout.updatemenus[0].buttons[0].args[1]['transition']['duration'] = 20
except Exception:
pass
# Colorbar labels
fig.update_layout(coloraxis_colorbar=dict(
title="Distance (approx.)",
tickvals=[4, 3, 2, 1, 0],
ticktext=["~10km", "~1km", "~100m", "~10m", "~1m"]
))
# Basemap: use Mapbox if token provided, else USGS raster tiles
if os.getenv("MAPBOX_TOKEN"):
fig.update_layout(mapbox_style="streets")
else:
fig.update_layout(
mapbox_style="white-bg",
mapbox_layers=[
{
"below": "traces",
"sourcetype": "raster",
"sourceattribution": "United States Geological Survey",
"source": [
"https://basemap.nationalmap.gov/arcgis/rest/services/USGSImageryOnly/MapServer/tile/{z}/{y}/{x}"
],
}
],
)
fig.update_layout(height=700,width=1000,
margin={"r":0,"t":40,"l":0,"b":0},
mapbox=dict(
center={"lat": 40.741895, "lon": -73.989308},zoom=10))
fig.show()